-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[nats] NATS backend rewrite, perf and stability improvements #194
Conversation
@brandond Quick question on desired However, in a clustered setup, e.g. three k3s servers, each will be watching, competing for leases, etc. so interleaved writes can definitely be attempted. NATS KV has a native Update operation that takes the "expected last sequence/version" of that key to effectively do a conditional write, to prevent last-writer-wins behavior, i.e. serializability per key. All that said, while watching the logs for which sets of keys get aggressively updated, it seems like it can be problematic, such as With the native etcd implementation in clustered mode, are updates serialized per key? Reads/watches are always going to be stale when a write is attempted, so it seems like if the loop is to attempt, then refetch, then retry, this would exacerbate the problem. |
Kubernetes uses an etcd transaction to ensure that the key revision is the same as what was expected. If the revision matches, then the key is written to. If it does not match, then the current value is returned, and the client gets a conflict error along with a copy of the current object state. The client is expected to merge their intended changes into the current object state, and retry. If NATS is not honoring this transaction behavior, then there is probably a significant potential for incorrect behavior. You can find the client's transaction at https://github.com/kubernetes/kubernetes/blob/3cf3702d1e638548ba9fcfd957ebae43e1a573cb/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go#L527-L533 |
Thanks for the clarification. To confirm, the previous implementation of the backend was not honoring the transactional behavior, but this patch does. |
There are a couple remaining items to clean up, but this is nearly complete. There are a number of changes, so I am happy to annotate the PR with comments on specific points of interest if that is preferred. |
As the resource version semantics had been missing from the old implementation, it makes me wonder whether Kube conformance tests are run on any of these backends, especially new ones. There are pretty extensive tests around watches and resource versions in there. They would likely have exposed the missing RV support. |
I agree and would have appreciated those from the get-go. For the previous implementation, it was not advertised as supporting HA mode formally and it also used mutexes to lock on the various key prefixes. This serialized writes, however the versions that were returned when certain operations occurred was not very clear. My understanding is that the CI runs a set of benchmarks, but I am not sure if this implicitly checks for conformance as well? |
The tests do no check for conformance, they only validate basic functionality. IFAIK Conformance checks are only done by QA with a full K3s release. |
The merging of #233 has likely caused merge conflict, sorry. |
Understood @dereknola and no worries with the conflict. FWIW, I did dig into the |
Looks like there are conflicts - can you rebase, and we'll see if the tests pass? |
@brandond I see the changes to the Watch API, we rebase and update the code today. |
Signed-off-by: Byron Ruth <[email protected]>
Signed-off-by: Byron Ruth <[email protected]>
The main new component is a btree cache to optimize List and Count operations. The List operation assumes and ordered set of results since a limit can be defined. A set of tests have been added. Support for embedded `dataDir` path and `credsFile` URL options have been added. Signed-off-by: Byron Ruth <[email protected]>
Signed-off-by: Byron Ruth <[email protected]>
Just an FYI, it's likely another NATS server release is going out tomorrow (some fixes along and time with the new Go releases). If we can delay a merge another day, I will update to this new release before getting incorporated into the next k3s release. |
watching here for an update - will merge once updated. |
2.10.5 is delayed to either tomorrow (Thursday) or Friday. Not sure if you want to move ahead and then I can open a PR to update to the dependency. There are some fixes relevant for the kine/k3s workload, but did not want to block your upstream release. |
I'm in no hurry to get anything out, there are no Kubernetes releases in November so we're not doing a K3s release that would even ship any of this until December. |
Sounds great, will update later this week. |
Signed-off-by: Byron Ruth <[email protected]>
Signed-off-by: Byron Ruth <[email protected]>
Signed-off-by: Byron Ruth <[email protected]>
Updated this to 2.10.5 (released yesterday), so ready to merge from my perspective. |
No description provided.